@sergical sergical commented Oct 25, 2025

DESCRIBE YOUR PR

  • Caches the release registry
  • Looks into md exports

IS YOUR CHANGE URGENT?

Help us prioritize incoming PRs by letting us know when the change needs to go live.

  • Urgent deadline (GA date, etc.):
  • Other deadline:
  • None: Not urgent, can wait up to 1 week+

SLA

  • Teamwork makes the dream work, so please add a reviewer to your PRs.
  • Please give the docs team up to 1 week to review your PR unless you've added an urgent due date to it.
    Thanks in advance for your help!

PRE-MERGE CHECKLIST

Make sure you've checked the following before merging your changes:

  • Checked Vercel preview for correctness, including links
  • PR was reviewed and approved by any necessary SMEs (subject matter experts)
  • PR was reviewed and approved by a member of the Sentry docs team

LEGAL BOILERPLATE

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

EXTRA RESOURCES

vercel bot commented Oct 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| develop-docs | Ready | Preview | Comment | Oct 29, 2025 4:02pm |
| sentry-docs | Ready | Preview | Comment | Oct 29, 2025 4:02pm |

codecov bot commented Oct 25, 2025

Bundle Report

Changes will decrease total bundle size by 460.9kB (-1.96%) ⬇️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| sentry-docs-client-array-push | 10.16MB | -6 bytes (-0.0%) ⬇️ |
| sentry-docs-server-cjs | 12.52MB | -460.9kB (-3.55%) ⬇️ |

Affected Assets, Files, and Routes:

view changes for bundle: sentry-docs-client-array-push

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| static/chunks/pages/_app-*.js | -3 bytes | 882.71kB | -0.0% |
| static/chunks/8321-*.js | -3 bytes | 425.87kB | -0.0% |
| server/middleware-*.js | 6.46kB | 7.46kB | 645.5% ⚠️ |
| server/middleware-*.js | -6.46kB | 1.0kB | -86.59% |
| static/fqMK9BHK1nHyXvt1MK9Ok/_buildManifest.js (New) | 684 bytes | 684 bytes | 100.0% 🚀 |
| static/fqMK9BHK1nHyXvt1MK9Ok/_ssgManifest.js (New) | 77 bytes | 77 bytes | 100.0% 🚀 |
| static/c-*.js (Deleted) | -77 bytes | 0 bytes | -100.0% 🗑️ |
| static/c-*.js (Deleted) | -684 bytes | 0 bytes | -100.0% 🗑️ |

view changes for bundle: sentry-docs-server-cjs

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| 1729.js | -33.51kB | 1.74MB | -1.89% |
| ../instrumentation.js | -33.8kB | 1.07MB | -3.07% |
| 9523.js | -33.51kB | 1.04MB | -3.11% |
| ../app/[[...path]]/page.js.nft.json | -119.93kB | 739.55kB | -13.95% |
| ../app/platform-redirect/page.js.nft.json | -119.93kB | 739.46kB | -13.96% |
| ../app/sitemap.xml/route.js.nft.json | -119.93kB | 736.69kB | -14.0% |
| 7153.js (New) | 30.3kB | 30.3kB | 100.0% 🚀 |
| 9567.js | 924 bytes | 23.11kB | 4.17% |
| ../app/api/ip-ranges/route.js | -300 bytes | 5.79kB | -4.92% |
| ../app/robots.txt/route.js | -300 bytes | 5.02kB | -5.64% |
| 2311.js (Deleted) | -30.9kB | 0 bytes | -100.0% 🗑️ |

Files in 9567.js:

  • ./src/mdx.ts → Total Size: 27.86kB

App Routes Affected:

| App Route | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| / | -600 bytes | 2.81MB | -0.02% |

- Use VERCEL_GIT_COMMIT_REF (branch name) in cache keys for cross-commit persistence
- Include registry data hash in cache key to detect registry updates
- Enable caching for 200+ platform-include files (previously skipped)
- Add build timing instrumentation
- Expected: 18 min → 2-3 min on first build, ~2 min on subsequent commits
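
The first two bullets describe a cache key that stays stable across commits on one branch but changes when the registry data changes. A minimal sketch of how such a key might be composed follows. The function names, the key layout, and the small FNV-1a hash are illustrative assumptions, not code from this PR; the actual script may well use `node:crypto` and a different layout.

```javascript
// Stand-in 32-bit FNV-1a hash over UTF-16 code units (illustrative only;
// chosen so the sketch needs no imports, not because the PR uses it).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical key layout: branch name, registry hash, page-source hash.
// The branch name (not the commit SHA) keeps the key stable across commits.
function buildCacheKey({branch, registryData, source}) {
  const registryHash = fnv1a(JSON.stringify(registryData));
  return `${branch}:${registryHash}:${fnv1a(source)}`;
}

// VERCEL_GIT_COMMIT_REF holds the branch name on Vercel; 'local' is an assumed fallback.
const branch = process.env.VERCEL_GIT_COMMIT_REF ?? 'local';
const registry = {'example.sdk': {version: '1.0.0'}}; // made-up registry data
console.log(buildCacheKey({branch, registryData: registry, source: '# Page'}));
```

Note that `JSON.stringify` is sensitive to key order, so a real implementation would probably want the registry data serialized in a stable order before hashing.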
@sergical sergical changed the title from "feat(Vercel) dont generate md exports on preview builds" to "feat(Vercel) Build cache improvements" Oct 27, 2025
@sergical sergical marked this pull request as ready for review October 28, 2025 20:02

Comment on lines +232 to +234
const leanHTML = rawHTML
  // Remove all script tags (build IDs, chunk hashes, Vercel injections)
  .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization High

This string may still contain <script, which may cause an HTML element injection vulnerability.
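
To illustrate what this finding describes: a single replacement pass can leave behind a new `<script` opener that is spliced together from the text on either side of a removed match. A minimal sketch, where the input string and the `stripUntilStable` helper are made up for illustration and are not part of the PR:

```javascript
const SCRIPT_RE = /<script[^>]*>[\s\S]*?<\/script>/gi;

// Hypothetical input: removing the inner element joins "<scr" and "ipt>".
const nested = '<scr<script>x</script>ipt>alert(1)</script>';
const onePass = nested.replace(SCRIPT_RE, '');
console.log(onePass); // a complete script element survives the single pass

// Hypothetical helper: repeat the replacement until the string stops changing.
function stripUntilStable(html, pattern) {
  let previous;
  let current = html;
  do {
    previous = current;
    current = current.replace(pattern, '');
  } while (current !== previous);
  return current;
}

const stable = stripUntilStable(nested, SCRIPT_RE);
console.log(JSON.stringify(stable));
```

Looping until the output is stable addresses this particular finding, but it does not make a regex a general HTML sanitizer.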
// Remove elements that change between builds but don't affect markdown output
const leanHTML = rawHTML
  // Remove all script tags (build IDs, chunk hashes, Vercel injections)
  .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')

Check failure

Code scanning / CodeQL

Bad HTML filtering regexp High

This regular expression does not match script end tags like </script >.
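
A small sketch of the case this finding points at. The sample HTML and the "tolerant" pattern are assumptions for illustration; the tolerant pattern only widens the end-tag match and is not a replacement for a real HTML parser.

```javascript
// The pattern from the PR: requires the end tag to be exactly "</script>".
const strictRe = /<script[^>]*>[\s\S]*?<\/script>/gi;
// Illustrative variant: allows whitespace or other junk before the closing ">".
const tolerantRe = /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/gi;

// Made-up input with a space inside the end tag, which browsers still accept.
const html = '<p>kept</p><script>build()</script >';

const strictResult = html.replace(strictRe, '');
const tolerantResult = html.replace(tolerantRe, '');

console.log(strictResult === html); // the strict pattern removes nothing here
console.log(tolerantResult);
```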

Copilot Autofix

AI 3 days ago

The best way to fix this problem is to use a proper HTML parser to remove unwanted tags (such as <script>, <link>, and <meta>), rather than relying on regular expressions. This provides more robust handling of HTML's intricacies, such as extra whitespace, unusual attribute formatting, and invalid but tolerated browser syntax. Since the script already imports rehype-parse (for parsing HTML to a syntax tree) and other tools from the unified/rehype ecosystem, the fix can use these existing libraries.

Specifically, instead of using .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') (and similar regex for <link> and <meta>), we should parse the HTML into an AST, programmatically remove the unwanted nodes, and then serialize the AST back to HTML for further processing. This fix should be applied within the genMDFromHTML function, replacing the leanHTML construction (lines 233–242) with parser-based routines.

No new dependencies are needed since rehype-parse, unist-util-remove, and related packages are already imported. We'll need to use unified().use(rehypeParse, {fragment: true}) to parse the HTML, use remove(tree, test) from unist-util-remove to strip undesired nodes, and a rehype serializer (e.g., rehype-stringify) to convert the AST back to HTML. If not already available, we should add a rehype-stringify import.


Suggested changeset 1
scripts/generate-md-exports.mjs

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/generate-md-exports.mjs b/scripts/generate-md-exports.mjs
--- a/scripts/generate-md-exports.mjs
+++ b/scripts/generate-md-exports.mjs
@@ -26,6 +26,7 @@
 import remarkStringify from 'remark-stringify';
 import {unified} from 'unified';
 import {remove} from 'unist-util-remove';
+import rehypeStringify from 'rehype-stringify';
 
 const DOCS_ORIGIN = 'https://docs.sentry.io';
 const CACHE_VERSION = 3;
@@ -230,17 +231,44 @@
 
   // Normalize HTML to make cache keys deterministic across builds
   // Remove elements that change between builds but don't affect markdown output
-  const leanHTML = rawHTML
-    // Remove all script tags (build IDs, chunk hashes, Vercel injections)
-    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
-    // Remove link tags for stylesheets and preloads (chunk hashes change)
-    .replace(/<link[^>]*>/gi, '')
-    // Remove meta tags that might have build-specific content
-    .replace(/<meta name="next-size-adjust"[^>]*>/gi, '')
-    // Remove data attributes that Next.js/Vercel add (build IDs, etc.)
-    .replace(/\s+data-next-[a-z-]+="[^"]*"/gi, '')
-    .replace(/\s+data-nextjs-[a-z-]+="[^"]*"/gi, '');
+  // Remove all <script>, <link>, and next-size-adjust <meta> tags, as well as data-* attributes, using an HTML parser.
+  const parsedHtmlTree = unified()
+    .use(rehypeParse, {fragment: true})
+    .parse(rawHTML);
 
+  // Remove unwanted elements using unist-util-remove
+  // Remove <script> tags
+  remove(parsedHtmlTree, (node) => node.type === 'element' && node.tagName === 'script');
+  // Remove <link> tags
+  remove(parsedHtmlTree, (node) => node.type === 'element' && node.tagName === 'link');
+  // Remove <meta name="next-size-adjust" ...>
+  remove(parsedHtmlTree, (node) =>
+    node.type === 'element' &&
+    node.tagName === 'meta' &&
+    node.properties &&
+    node.properties.name === 'next-size-adjust'
+  );
+  // Remove data-next-* and data-nextjs-* attributes from all elements
+  function cleanseDataAttrs(node) {
+    if (node && node.type === 'element' && node.properties) {
+      Object.keys(node.properties).forEach((key) => {
+        if (/^data-next(-|js-)/.test(key)) {
+          delete node.properties[key];
+        }
+      });
+    }
+    if (node.children) {
+      node.children.forEach(cleanseDataAttrs);
+    }
+  }
+  cleanseDataAttrs(parsedHtmlTree);
+
+  // Convert AST back to HTML
+  const leanHTML = unified()
+    .use(() => (tree) => tree) // identity plugin since tree already processed
+    .use(rehypeStringify)
+    .stringify(parsedHtmlTree);
+
   if (shouldDebug) {
     console.log(
       `✂️  Lean HTML length: ${leanHTML.length} chars (removed ${rawHTML.length - leanHTML.length} chars)`
EOF
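
One thing worth verifying before adopting the patch above: hast trees, such as those produced by rehype-parse, store `data-*` attributes under camelCased property names (`data-next-foo` becomes `dataNextFoo`), so a key test like `/^data-next(-|js-)/` would likely never match. A minimal sketch of an attribute cleanup that tests the camelCased names instead follows. The tree is a hand-built stand-in and `stripNextDataProps` is a hypothetical helper, not code from the PR.

```javascript
// Hand-built stand-in for a hast tree; rehype-parse would normally produce this.
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'div',
      properties: {id: 'main', dataNextFoo: 'abc', dataNextjsScrollFocusBoundary: ''},
      children: [{type: 'text', value: 'hello'}],
    },
  ],
};

// Hypothetical helper: delete data-next-* and data-nextjs-* properties,
// matching the camelCased names (dataNextFoo, dataNextjsFoo).
function stripNextDataProps(node) {
  if (node.type === 'element' && node.properties) {
    for (const key of Object.keys(node.properties)) {
      if (/^dataNext(js)?[A-Z]/.test(key)) {
        delete node.properties[key];
      }
    }
  }
  for (const child of node.children ?? []) {
    stripNextDataProps(child);
  }
}

stripNextDataProps(tree);
console.log(Object.keys(tree.children[0].properties));
```

Running the real parser on a sample page and logging `node.properties` would confirm which key form the tree actually uses.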
Unable to commit as this autofix suggestion is now outdated